Abstract. Formal Concept Analysis (FCA) is a mathematical theory based on
the formalization of the notions of concept and concept hierarchies. It has been 
successfully applied to several Computer Science fields such as data mining, 
software engineering, and knowledge engineering, and in many domains like 
medicine, psychology, linguistics and ecology. For instance, it has been exploited 
for the design, mapping and refinement of ontologies. In this paper, we show 
how FCA can benefit from a given domain ontology by analyzing the impact of a 
taxonomy (on objects and/or attributes) on the resulting concept lattice. We will 
mainly concentrate on the usage of a taxonomy to extract generalized patterns 
(i.e., knowledge generated from data when elements of a given domain ontology 
are used) in the form of concepts and rules, and improve navigation through these 
patterns. To that end, we analyze three generalization cases ($\exists$, $\forall$, and $\alpha$) and
show their impact on the size of the generalized pattern set. Different scenarios 
of simultaneous generalizations on both objects and attributes are also discussed. 



1 Introduction 

Formal Concept Analysis (FCA) is a formalism for knowledge representation which is 
based on the formalization of "concepts" and "concept hierarchies" [14]. In traditional
philosophy, a concept is considered to be determined by its extent and its intent. The 
extent contains all entities (e.g., objects, individuals) belonging to the concept while the 
intent includes all properties common to all entities in the extent. The concept hierarchy 
states that "a concept is more general if it contains more entities" and is also called a 
specialization-relation on concepts. FCA relies on the mathematical notions of binary
relations, Galois connections and ordered structures and has its roots in philosophy.
It provides methods to extract and display knowledge from databases and has many 
applications in knowledge representation and management, data mining, and machine 
learning. 

In philosophy, ontology is the study of the categories of things that exist or may 
exist in a specific domain. In computer science, it is an explicit conceptualization of 
a given domain in the form of concepts and their relations (roles), as well as concept 
instances that are linked by relations instantiating generic roles. Roles are usually
directed so that a given role maps the instances of a source concept to those of a target
one. Ontology design and utilization are presently gaining increasing interest with
the emergence of the Semantic Web [5], and standardization efforts are progressing in
the field of ontological languages such as OWL. Many studies were concerned with
ontology construction, mapping and integration [19,21].

In ontology, a concept can be understood as its FCA-intent (attributes), and the 
FCA-entities (objects) as instantiations of concepts. One particular relation between 
concepts represents the is-a hierarchy. This corresponds to the specialization-relation in 
FCA, and provides a taxonomy on the attributes of the domain of interest. The primary 
goal of an ontology is to model the concepts and their relations on a domain of interest, 
whilst FCA aims to discover concepts from a given data set. Within FCA, an interactive 
method for knowledge acquisition called "attribute exploration" has been developed to 
discover and express knowledge from a domain of interest with the help of a domain 
expert [11,12,13]. This method has been widely used for ontology engineering and
refinement (see Section 7).

FCA and Ontology both use ordered structures to model or manage knowledge. 
To the best of our knowledge, the work by Cimiano et al. [7] is the first study that
investigated the possible use of Ontology in FCA by first clustering text documents
using an ontology and then applying FCA. One recurrent problem in FCA is the huge
number of concepts that can be derived from a data set since it may be exponential in
the size of the context. How can we handle this problem? Many techniques have been
proposed [7] in order to use or produce a taxonomy on attributes or objects to control
the size of the context and the corresponding concept lattice. Another trend is to query
pattern bases (e.g., rules and concepts) in a similar way as querying databases [20] in
order to display the patterns that are the most relevant to the user.

Patterns are a concise and semantically rich representation of data [6]. These can
be clusters, concepts, association rules, decision trees, etc. In this work we analyze
some possible ways to abstract (group) objects/attributes together to get generalized
patterns such as generalized itemsets and association rules [27]. The problem we address
in this paper is the use of taxonomies on attributes or objects to produce and manipulate
generalized patterns.

The rest of this contribution is organized as follows. In Section 2 we introduce
the basic notions of FCA. Section 3 presents different ways to group attributes/objects
to produce generalized patterns. In Section 4 we discuss line diagrams of generalized
patterns while in Section 5 the size of the generalized concept set is compared to the size
of the initial (before generalization) concept set. Some experimental results are shown
in Section 6. Finally, existing work about combining FCA with Ontology is briefly
described in Section 7.

2 Formal Concept Analysis and Data Mining 

2.1 Elementary information systems, contexts and concepts 

The elementary way to encode information is to describe, by means of a relation, that
some objects have some properties. Figure 1 (left) describes items $a, \ldots, h$ that appear
in eight transactions of a market basket analysis application. Such a setting defines a
binary relation $I$ between the set $G$ of objects/transactions and the set $M$ of
properties/items. The triple $(G, M, I)$ is called a formal context. In Subsection 2.4 we will
see how to convert data from different formats (many-valued contexts) to binary
contexts. When an object $g$ is in relation $I$ with an attribute $m$, we write $(g, m) \in I$ or
$g \mathrel{I} m$.

Some interesting patterns are formed by objects sharing the same properties. In data 
mining applications, many techniques are based on the formalization of such patterns, 
namely that of concepts. A concept is defined by its extent (all entities belonging to this 
concept) and its intent (all attributes common to all objects of this concept). 

In a formal context $(G, M, I)$ a formal concept is a pair $(A, B)$ such that $B$ is
exactly the set of all properties shared by the objects in $A$ and $A$ is the set of all objects
that have all the properties in $B$. We set $A' := \{m \in M \mid a \mathrel{I} m \text{ for all } a \in A\}$ and
$B' := \{g \in G \mid g \mathrel{I} b \text{ for all } b \in B\}$. Then $(A, B)$ is a concept of $(G, M, I)$ iff $A' = B$
and $B' = A$. The extent of the concept $(A, B)$ is $A$ and its intent $B$. We denote by
$\mathfrak{B}(G, M, I)$, $\mathrm{Int}(G, M, I)$ and $\mathrm{Ext}(G, M, I)$ the set of concepts, intents and extents of
the formal context $(G, M, I)$, respectively. A subset $X$ is closed if $X'' = X$, where
$X''$ denotes $(X')'$. Closed subsets of $G$ are exactly extents and closed subsets of $M$ are
intents of $(G, M, I)$.
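The two derivation operators and the closure condition can be tried out on a small example. The following is a minimal sketch, assuming a hypothetical four-object context (it is not the context of Figure 1); the names `CONTEXT`, `intent_of`, `extent_of` and `all_concepts` are illustrative only, and the naive enumeration over all attribute subsets is meant for toy inputs, not as an efficient algorithm.

```python
from itertools import combinations

# Hypothetical toy context (not the one in Figure 1): object -> set of attributes.
CONTEXT = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"c"},
}
ATTRIBUTES = {"a", "b", "c"}


def intent_of(objects):
    """A' : the attributes shared by every object in `objects`."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent_of(attributes):
    """B' : the objects that have every attribute in `attributes`."""
    return {g for g, row in CONTEXT.items() if set(attributes) <= row}


def all_concepts():
    """Enumerate all concepts naively: (B', B'') for every subset B of attributes."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), size):
            extent = extent_of(subset)
            concepts.add((frozenset(extent), frozenset(intent_of(extent))))
    return concepts


CONCEPTS = all_concepts()
```

Every pair in `CONCEPTS` satisfies $A' = B$ and $B' = A$; for instance `{b}` is not closed here, since every object having `b` also has `a`.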

In the market basket analysis and association rule mining framework, the set $G$ of
objects is usually the set of transactions (or customers), the set $M$ of attributes is the set
of bought items (or products) and itemsets are subsets of $M$. The support of an itemset
$X$ is defined by $\mathrm{supp}(X) := \frac{|X'|}{|G|}$. Itemsets can be classified with respect to a threshold
minsupp so that an itemset $X$ is frequent if $\mathrm{supp}(X) \geq$ minsupp. One advantage of
using FCA in data mining is that it reduces the computation of frequent itemsets to the
frequent closed itemsets (i.e. frequent intents) only (see [22,23,31,33,36]). Note that
$\mathrm{supp}(X) = \mathrm{supp}(X'')$, and subsets of frequent itemsets are frequent. Then all frequent
itemsets can be deduced from the closed ones.
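The identity between the support of an itemset and the support of its closure can be checked directly. This is a minimal sketch under the assumption of a hypothetical set of four transactions; `TRANSACTIONS`, `closure` and `support` are illustrative names, and exact fractions are used so that equalities can be compared safely.

```python
from fractions import Fraction

# Hypothetical transactions (not the context of Figure 1): id -> bought items.
TRANSACTIONS = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"c"},
}
ITEMS = {"a", "b", "c"}


def extent_of(itemset):
    """X' : the transactions containing every item of X."""
    return {t for t, row in TRANSACTIONS.items() if set(itemset) <= row}


def closure(itemset):
    """X'' : the items common to all transactions that contain X."""
    closed = set(ITEMS)
    for t in extent_of(itemset):
        closed &= TRANSACTIONS[t]
    return closed


def support(itemset):
    """supp(X) = |X'| / |G|, as an exact fraction."""
    return Fraction(len(extent_of(itemset)), len(TRANSACTIONS))
```

Here `{b}` and its closure `{a, b}` occur in exactly the same transactions, so only the closed itemset needs to be stored together with its support.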

There is a hierarchy between concepts stating that a concept $c_1$ is more general
than a concept $c_2$ if its extent is larger than the extent of $c_2$ or equivalently if its intent
is smaller than the intent of $c_2$. The concept hierarchy is formalized with a relation $\leq$
defined on $\mathfrak{B}(G, M, I)$ by $(A, B) \leq (C, D) :\iff A \subseteq C$ (equivalently, $B \supseteq D$). This is an
order relation, and is also called a specialization/generalization-relation on concepts.
In fact, the concept $(A, B)$ is called a specialization of the concept $(C, D)$, and the
concept $(C, D)$ a generalization of the concept $(A, B)$, whenever $(A, B) \leq (C, D)$
holds.

For any list $\mathcal{C}$ of concepts of $(G, M, I)$, there is a concept $u$ of $(G, M, I)$ that is more
general than every concept in $\mathcal{C}$ and more specific than every concept more general
than every concept in $\mathcal{C}$ (i.e. $u$ is the supremum of $\mathcal{C}$, usually denoted by $\bigvee \mathcal{C}$), and
there is a concept $v$ of $(G, M, I)$ that is a specialization of every concept in $\mathcal{C}$ and
a generalization of every specialization of all concepts in $\mathcal{C}$ (i.e. $v$ is the infimum of
$\mathcal{C}$, also denoted by $\bigwedge \mathcal{C}$)¹. Then every subset $\mathcal{C}$ of $\mathfrak{B}(G, M, I)$ has an infimum and
a supremum. Hence, $\mathfrak{B}(G, M, I)$ is a complete lattice, called the concept lattice of
the context $(G, M, I)$. Recall that a lattice is an algebra $(L, \wedge, \vee)$ of type $(2, 2)$ such
that $\wedge$ and $\vee$ are idempotent, commutative, associative and satisfy the absorption laws:
$x \wedge (x \vee y) = x$ and $x \vee (x \wedge y) = x$. It is complete if every subset has an infimum and
a supremum.

¹ If $\mathcal{C}$ is a two-element set $\{X_1, X_2\}$, we write $X_1 \vee X_2$ and $X_1 \wedge X_2$ for its supremum and its
infimum.

For $g \in G$ and $m \in M$ we set $g' := \{g\}'$ and $m' := \{m\}'$. The object
concepts $(\gamma g := (g'', g'))_{g \in G}$ and the attribute concepts $(\mu m := (m', m''))_{m \in M}$ form the
"building blocks" of $\mathfrak{B}(G, M, I)$. In fact, every concept of $(G, M, I)$ is a supremum of
some $\gamma g$'s and infimum of some $\mu m$'s². Thus, the set $\{\gamma g \mid g \in G\}$ is $\bigvee$-dense and the
set $\{\mu m \mid m \in M\}$ is $\bigwedge$-dense in $\mathfrak{B}(G, M, I)$.

The basic theorem on formal concept analysis is given below. 

Theorem 1. [34] The set of all concepts of a formal context $(G, M, I)$ ordered by the
specialization/generalization-relation forms a complete lattice, in which infimum and
supremum are given by

$$\bigwedge_{k \in K} (A_k, B_k) = \Big( \bigcap_{k \in K} A_k, \big( \bigcup_{k \in K} B_k \big)'' \Big), \qquad \bigvee_{k \in K} (A_k, B_k) = \Big( \big( \bigcup_{k \in K} A_k \big)'', \bigcap_{k \in K} B_k \Big).$$

Conversely, a complete lattice $L$ is isomorphic to a concept lattice of a context $(G, M, I)$
iff there are maps $\alpha : G \to L$ and $\beta : M \to L$ such that $\alpha(G)$ is $\bigvee$-dense in $L$, $\beta(M)$
is $\bigwedge$-dense in $L$ and $g \mathrel{I} m \iff \alpha(g) \leq \beta(m)$.

Many research studies in FCA have focused on the design and implementation of
efficient algorithms for computing the set of concepts. The number of concepts can be
extremely large, even exponential in the size of the context³. So how are such large
sets of concepts handled? Many techniques have been proposed [14], based on context
decomposition or lattice pruning/reduction (atlas decomposition, direct or subdirect
decomposition, iceberg concept lattices, nested line diagrams, ...).



2.2 Labeled line diagrams of concept lattices 

One of the strengths of FCA is the ability to pictorially display knowledge [35], at least
for contexts of reasonable size. Finite concept lattices can be represented by labeled
Hasse diagrams (see Figure 1). Each node represents a concept. The label $g$ is written
underneath $\gamma g$ and $m$ above $\mu m$. The extent of a concept represented by a node $a$
is given by all labels in $G$ from the node $a$ downwards, and the intent by all labels in
$M$ from $a$ upwards. For example, the label 5 in the right side of Figure 1 represents
the object concept $\gamma 5 = (\{5, 6\}, \{a, c, d\})$. On the right of the node labeled by 5, there
is a node with no label (between nodes labeled by 8 and $d$). It represents the concept
$(\{6, 8\}, \{d, c, h\})$. Diagrams are valuable tools for visualizing data. However, drawing a
good diagram is a big challenge. The concept lattice can be very large and have
a complex structure. Therefore, we need tools to "approximate" the output by reducing
the size of the input, making the structure nicer or exploring the diagram layer by
layer. For the last case, FCA offers nested line diagrams as a means to visualize the
concepts level-wise.

² For $(A, B) \in \mathfrak{B}(G, M, I)$ we have $\bigvee \{\gamma g \mid g \in A\} = (A, B) = \bigwedge \{\mu m \mid m \in B\}$.
³ A context with $n$ objects and $n$ attributes can have up to $2^n$ concepts.



Fig. 1. A formal context (left) and a line diagram of its concept lattice (right). $a, b, \ldots, h$ are
items that appear in transactions $1, \ldots, 8$.



Assume that we want to examine a context $\mathbb{K} := (G, M, I)$ where $M$ is a large
set. We can split $M$ into two sets $M_1$ and $M_2$ and consider the subcontexts $\mathbb{K}_1 :=
(G, M_1, I_1)$ and $\mathbb{K}_2 := (G, M_2, I_2)$, where $I_1 := I \cap (G \times M_1)$ and $I_2 := I \cap (G \times M_2)$. The
subsets $M_1$ and $M_2$ need not be disjoint. The only requirement is that $M_1 \cup M_2 = M$.
The idea is to have a view of the structure restricted to the attributes in $M_2$, and then
refine with the attributes in $M_1$ to get the whole view. Therefore, we construct the
lattices $\mathfrak{B}(\mathbb{K}_1)$ and $\mathfrak{B}(\mathbb{K}_2)$, which are smaller than $\mathfrak{B}(\mathbb{K})$, and combine them to
get $\mathfrak{B}(\mathbb{K})$. The extents of $\mathbb{K}$ are exactly the intersections of extents of $\mathbb{K}_1$ and $\mathbb{K}_2$. We
first draw a line diagram for $\mathfrak{B}(\mathbb{K}_2)$ (which corresponds to a rough view), with each
node large enough to contain a copy of the line diagram of $\mathfrak{B}(\mathbb{K}_1)$. Afterwards, we
insert a copy of the line diagram of $\mathfrak{B}(\mathbb{K}_1)$ in each node of the line diagram of $\mathfrak{B}(\mathbb{K}_2)$ and
mark on these copies only the nodes that are effectively concepts of $\mathbb{K}$. The constructed
diagram is called a nested line diagram; the one shown in Figure 5 was
produced with ToscanaJ.
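The statement that the extents of the whole context are exactly the intersections of extents of the two subcontexts can be checked on a small case. The sketch below assumes a hypothetical four-object context and an arbitrary split of its attributes into `M1` and `M2`; the function name `extents` and the brute-force enumeration are illustrative only.

```python
from itertools import combinations

# Hypothetical toy context (not the one in Figure 1): object -> set of attributes.
CONTEXT = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"c"},
}


def extents(attributes):
    """All extents of the subcontext restricted to `attributes` (brute force)."""
    attributes = sorted(attributes)
    found = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            found.add(frozenset(g for g, row in CONTEXT.items() if set(subset) <= row))
    return found


# Assumed split of the attribute set; M1 and M2 only need to cover all attributes.
M1, M2 = {"a", "b"}, {"c"}
COMBINED = {e1 & e2 for e1 in extents(M1) for e2 in extents(M2)}
```

In this toy case the subcontexts have three and two extents respectively, and their pairwise intersections give back the six extents of the full context.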



2.3 Implications and association rules from contexts 

The knowledge extracted from a formal context and its corresponding concept lattice
can also be displayed in the form of association rules (including implications). Let $M$
be a set of properties or attributes. An association rule [2] between attributes in $M$
is a pair $(Y, Z)$, denoted by $Y \to Z$, where $Y$ is its premise and $Z$ its conclusion. The
support of a rule $Y \to Z$ is defined by $\mathrm{supp}(Y \to Z) := \mathrm{supp}(Y \cup Z)$, and its confidence
by $\mathrm{conf}(Y \to Z) := \frac{\mathrm{supp}(Y \cup Z)}{\mathrm{supp}(Y)}$. A rule $Y \to Z$ is a valid implication in a context $(G, M, I)$
if every object having all the attributes in $Y$ also has all the attributes in $Z$. A rule
$Y \to Z$ is strong in $(G, M, I)$ with respect to the thresholds minsupp and minconf, if
$Y \cup Z$ is a frequent itemset and $\mathrm{conf}(Y \to Z) \geq$ minconf. In Apriori-like algorithms [2],
rule extraction is done in two steps: detection of all frequent itemsets, and utilization
of frequent itemsets to generate association rules that have a confidence $\geq$ minconf.
While the second step is relatively easy and cost-effective, the first one presents a great
challenge because the set of frequent itemsets may grow exponentially with the whole
set of items. One substantial contribution of FCA in association rule mining is that it
speeds up the computation of frequent itemsets and association rules by concentrating
only on closed itemsets [22,23,31,33,36] and by computing minimal rule sets such as
the Guigues-Duquenne basis [15]. Another solution to the problem of the overwhelming
number of rules is to extract generalized association rules using a taxonomy on items
[27]. Before we move to generalized patterns, let us see how data are transformed into
binary contexts, the suitable format for our data.
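Support, confidence and the notion of a strong rule translate directly into code. This is a minimal sketch over a hypothetical set of four transactions; the function names and the choice to return `None` for the confidence of a rule whose premise never occurs are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical transactions (not the context of Figure 1): id -> bought items.
TRANSACTIONS = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"c"},
}


def support(itemset):
    """supp(X): fraction of transactions containing every item of X."""
    hits = sum(1 for row in TRANSACTIONS.values() if set(itemset) <= row)
    return Fraction(hits, len(TRANSACTIONS))


def confidence(premise, conclusion):
    """conf(Y -> Z) = supp(Y u Z) / supp(Y); None when Y never occurs."""
    base = support(premise)
    if base == 0:
        return None
    return support(set(premise) | set(conclusion)) / base


def is_strong(premise, conclusion, minsupp, minconf):
    """A rule is strong if Y u Z is frequent and its confidence reaches minconf."""
    conf = confidence(premise, conclusion)
    if conf is None:
        return False
    return support(set(premise) | set(conclusion)) >= minsupp and conf >= minconf
```

A rule with confidence 1, such as `{b} -> {a}` in this toy data, is a valid implication of the context.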

2.4 Information Systems 

Frequently, data are not directly encoded in a "binary" form, but rather as a
many-valued context in the form of a tuple $(G, M, W, I)$ of sets such that $I \subseteq G \times M \times W$,
where $(g, m, w_1) \in I$ and $(g, m, w_2) \in I$ imply $w_1 = w_2$. $G$ is called the set of
objects, $M$ the set of attributes (or attribute names) and $W$ the set of attribute values. If
$(g, m, w) \in I$, then $w$ is the value of the attribute $m$ for the object $g$. Another
notation is $m(g) = w$ where $m$ is a partial map from $G$ to $W$. Many-valued contexts
can be transformed into binary contexts via conceptual scaling. A conceptual scale for
an attribute $m$ of $(G, M, W, I)$ is a binary context $\mathbb{S}_m := (G_m, M_m, I_m)$ such that
$m(G) \subseteq G_m$. Intuitively, $M_m$ discretizes or groups the attribute values in $m(G)$,
and $I_m$ describes how each attribute value $m(g)$ is related to the elements in $M_m$. For
an attribute $m$ of $(G, M, W, I)$ and a conceptual scale $\mathbb{S}_m$ we derive a binary
context $\mathbb{K}_m := (G, M_m, I^m)$ with $g \mathrel{I^m} s_m :\iff m(g) \mathrel{I_m} s_m$, where $s_m \in M_m$. This
means that an object $g \in G$ is in relation with a scaled attribute $s_m$ iff the value of
$m$ on $g$ is in relation with $s_m$ in $\mathbb{S}_m$. With a conceptual scale for each attribute we
get the derived context $\mathbb{K}^{\mathbb{S}} := (G, N, I^{\mathbb{S}})$ where $N := \bigcup \{M_m \mid m \in M\}$ and
$g \mathrel{I^{\mathbb{S}}} s_m :\iff m(g) \mathrel{I_m} s_m$. In practice, the set of objects remains unchanged; each
attribute name $m$ is replaced by the scaled attributes $s_m \in M_m$. An information system
is a many-valued context $(G, M, W, I)$ with a set of scales $(\mathbb{S}_m)_{m \in M}$. The choice of a
suitable set of scales depends on the interpretation, and is usually done with the help of
a domain expert. A Conceptual Information System is a many-valued context together
with a set of conceptual scales (called conceptual schema) [26,29]. Other scaling
methods have also been proposed (see, e.g., [24,25]). The methods presented in Section 3
are actually a form of scaling.
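Conceptual scaling can be sketched as follows. The many-valued context, the two scales and all names below are hypothetical; each scale is encoded as a mapping from a scaled attribute to a predicate on values, which plays the role of the relation $I_m$, and a missing value simply contributes no scaled attribute (since $m$ is a partial map).

```python
# Hypothetical many-valued context: object -> {attribute name: value}.
MANY_VALUED = {
    "g1": {"age": 16, "city": "Ottawa"},
    "g2": {"age": 34, "city": "Boston"},
    "g3": {"age": 70},  # `city` is undefined for g3
}

# One assumed scale per attribute: scaled attribute -> predicate on the value.
SCALES = {
    "age": {
        "age<18": lambda v: v < 18,
        "age>=18": lambda v: v >= 18,
        "age>=65": lambda v: v >= 65,
    },
    "city": {
        "in Canada": lambda v: v in {"Ottawa", "Montreal"},
        "in USA": lambda v: v in {"Boston"},
    },
}


def derive_context(many_valued, scales):
    """Return the derived binary context: object -> set of scaled attributes."""
    derived = {}
    for g, values in many_valued.items():
        row = set()
        for m, value in values.items():
            for scaled_attribute, holds in scales[m].items():
                if holds(value):  # m(g) is related to the scaled attribute
                    row.add(scaled_attribute)
        derived[g] = row
    return derived


DERIVED = derive_context(MANY_VALUED, SCALES)
```

The objects are unchanged, while each attribute name is replaced by the scaled attributes of its scale; the grouping of cities into countries already has the flavor of the generalizations discussed in Section 3.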

3 Generalized Patterns 

In the field of data mining, generalized patterns represent pieces of knowledge extracted 
from data when an ontology is used. In this paper, we focus on exploiting generalization 
hierarchies attached to properties (and even objects) to get a lattice with more abstract 
concepts. Producing generalized patterns from concept lattices when a taxonomy on 
attributes is provided can be done in different ways with distinct performance costs that 
depend on the peculiarities of the input (e.g., size, density) and the operations used. 



In the following we formalize the way generalized patterns are produced. Let $\mathbb{K} :=
(G, M, I)$ be a context. The attributes of $\mathbb{K}$ can be grouped together to form another
set of attributes, namely $S$, to get a context where the attributes are more general than
in $\mathbb{K}$. For the market basket analysis example, items/products can be generalized into
product lines and then product categories. The context $(G, M, I)$ is then replaced with
a context $(G, S, J)$ as in the scaling process where $S$ can be seen as an index set such
that $\{m_s \mid s \in S\}$ covers $M$. We will usually identify the group $m_s$ with the index $s$.

There are mainly three ways to express the binary relation $J$ between the objects of
$G$ and the (generalized) attributes of $S$:

($\exists$) $g \mathrel{J} s :\iff \exists m \in s,\ g \mathrel{I} m$. Consider an information table describing
companies and their branches in North America. We first set up a context whose objects
are companies and whose attributes are the cities where these companies have or
may have branches. If there are too many cities, we can decide to group them into
provinces (in Canada) or states (in USA) to reduce the number of attributes. Then,
the (new) set of attributes is now a set $S$ whose elements are states and provinces. It
is quite natural to state that a company $g$ has a branch in a province/state $s$ if $g$ has
a branch in a city $m$ which belongs to the province/state $s$. Formally, $g$ has attribute
$s$ iff there is $m \in s$ such that $g$ has attribute $m$.

($\forall$) $g \mathrel{J} s :\iff \forall m \in s,\ g \mathrel{I} m$. Consider an information system about Ph.D. students
and the components of the comprehensive exam (CE). Assume that components
are: the written part, the oral part, and the thesis proposal, and that a student
succeeds in his exam if he succeeds in the three components of that exam. The objects
of the context are Ph.D. students and the attributes are the different exams taken by
students. If we group together the different components, for example

CE.written, CE.oral, CE.proposal $\mapsto$ CE.exam,

then it becomes natural to state that a student $g$ succeeds in his comprehensive exam
CE.exam if he succeeds in all the exam parts of CE, i.e. $g$ has attribute CE.exam
if for all $m$ in CE.exam, $g$ has attribute $m$.

($\alpha$%) $g \mathrel{J} s :\iff \frac{|\{m \in s \mid g \mathrel{I} m\}|}{|s|} \geq \alpha_s$, where $\alpha_s$ is a threshold set by the user for the
generalized attribute $s$. This case generalizes the ($\exists$)-case ($\alpha = \frac{1}{|M|}$) and the ($\forall$)-case
($\alpha = 1$). To illustrate this case, let us consider a context describing different
specializations in a given Master degree program. For each program there is a set of
mandatory courses and a set of optional ones. Moreover, there is a predefined
number of courses that a student must pass to get a degree in a given specialization.
Assume that to get a Master in Computer Science with a specialization in
"computational logic", a student must have seven courses from a set $s_1$ of mandatory
courses and three courses from a set $s_2$ of optional ones. Then, we can introduce
two generalized attributes $s_1$ and $s_2$ so that a student $g$ succeeds in the group $s_1$ if
he succeeds in at least seven courses from $s_1$, and succeeds in $s_2$ if he succeeds in
at least three courses from $s_2$. So, $\alpha_{s_1} := \frac{7}{|s_1|}$, $\alpha_{s_2} := \frac{3}{|s_2|}$ and

$$g \mathrel{J} s_i \iff \frac{|\{m \in s_i \mid g \mathrel{I} m\}|}{|s_i|} \geq \alpha_{s_i}, \quad 1 \leq i \leq 2.$$
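The three ways of defining $J$ can be sketched in a few lines. The context, the groups `P` and `Q` and the function names below are hypothetical and chosen for illustration; thresholds are exact fractions, and the threshold $\frac{1}{|M|}$ for recovering the $\exists$-case follows the reading of the text adopted here.

```python
from fractions import Fraction

# Hypothetical toy context (not the one in Figure 1): object -> set of attributes.
CONTEXT = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"a", "b", "c"},
    4: {"c"},
}
# Assumed grouping of the attributes: generalized attribute -> its members.
GROUPS = {"P": {"a", "b"}, "Q": {"c"}}


def generalize_exists(context, groups):
    """g J s iff g has at least one attribute of the group s."""
    return {g: {s for s, members in groups.items() if row & members}
            for g, row in context.items()}


def generalize_forall(context, groups):
    """g J s iff g has every attribute of the group s."""
    return {g: {s for s, members in groups.items() if members <= row}
            for g, row in context.items()}


def generalize_alpha(context, groups, alpha):
    """g J s iff the fraction of attributes of s that g has is at least alpha[s]."""
    return {g: {s for s, members in groups.items()
                if Fraction(len(row & members), len(members)) >= alpha[s]}
            for g, row in context.items()}
```

On this example, setting every threshold to 1 reproduces the $\forall$-generalization, and setting every threshold to $\frac{1}{|M|} = \frac{1}{3}$ reproduces the $\exists$-generalization.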




Fig. 2. An $\exists$-generalization on the attributes of the context in Figure 1. The generalized attributes
are $A := \{e, g\}$, $B := \{b, c\}$, $C := \{a, d\}$ and $D := \{f, h\}$. The line diagram of generalized
patterns (right) is the line diagram of $\mathfrak{B}(G, S_1, J_1)$, where $S_1 := \{A, B, C, D\}$ and $J_1$ is
obtained by an $\exists$-generalization, i.e., the last four columns of $\mathbb{K}_\exists$.



Attribute generalization reduces the number of attributes. One may therefore expect
a reduction of the number of concepts (i.e., $|\mathfrak{B}(G, S, J)| \leq |\mathfrak{B}(G, M, I)|$). Unfortunately,
this is not always the case, as we can see from the example in Figure 9. Therefore, it
is interesting to investigate under which condition generalizing patterns leads to a
"generalized" lattice of smaller size than the initial one (see Section 5). Moreover, finding
the connections between the implications and more generally association rules of the
generalized context and the initial one is also an important problem to be considered.
As an illustration, the contexts $\mathbb{K}_\exists := (G, M \cup S_1, I \cup J_1)$ where $S_1 = \{A, B, C, D\}$
(see Figure 2) and $\mathbb{K}_\forall := (G, M \cup S_2, I \cup J_2)$ with $S_2 = \{S, T, U, V\}$ (see Figure 3)
are obtained from the context $(G, M, I)$ shown in Figure 1 with the same grouping on
attributes of $M$, namely $A := \{e, g\} =: S$, $B := \{b, c\} =: T$, $C := \{a, d\} =: U$ and
$D := \{f, h\} =: V$. However, we need different names for the same groups, depending
on whether they are in $S_1$ or in $S_2$, since $g \mathrel{J_1} \{b, c\}$ (which means that $g \mathrel{I} b$ or $g \mathrel{I} c$, i.e.
an $\exists$-generalization) has a meaning different from $g \mathrel{J_2} \{b, c\}$ (which means that $g \mathrel{I} b$ and
$g \mathrel{I} c$, i.e. a $\forall$-generalization).

Fig. 3. A $\forall$-generalization on the attributes of the context in Figure 1. The generalized attributes
are $S := \{e, g\}$, $T := \{b, c\}$, $U := \{a, d\}$ and $V := \{f, h\}$. The line diagram of generalized
patterns (right) is the line diagram of $\mathfrak{B}(G, S_2, J_2)$, where $S_2 := \{S, T, U, V\}$ and $J_2$ is obtained
by a $\forall$-generalization, i.e., the last four columns of $\mathbb{K}_\forall$.

If data represent customers (transactions) and items (products), the usage of a
taxonomy on attributes leads to new useful patterns that could not be seen before generalizing
attributes. For example, the $\exists$-case (see Figure 2) helps the user acquire the following
knowledge:

- Customer 3 (at the bottom of the lattice) buys at least one item from each product
line.

- Whenever a customer buys at least one item from the product line $D$, then he/she
buys at least one item from the product line $A$.

From the $\forall$-case in Figure 3 one may learn for example that Customers 4 and 6 have
distinct behaviors in the sense that the former buys at least all the items of the product
lines $V$ and $S$ while the latter purchases at least all the items of the product lines $U$ and
$T$.

To illustrate the $\alpha$-case, we put the attributes of $M$ in three groups $E := \{a, b, c\}$,
$F := \{d, e, f\}$ and $H := \{g, h\}$ and set $\alpha := 60\%$ for all groups. This $\alpha$-generalization
on the attributes of $M$ is presented in Figure 4. Note that if all groups have two
elements, then any $\alpha$-generalization would be either an $\exists$-generalization ($\alpha \leq 0.5$) or a
$\forall$-generalization ($\alpha > 0.5$). From the lattice in Figure 4 one can see that any transaction
involving at least 60% of items in $H$ necessarily includes at least 60% of items in $F$.
Moreover, the product line $E$ seems to be the most popular among the three groups since
five (out of eight) customers bought at least 60% of items in $E$.

Fig. 4. An $\alpha$-generalization on the attributes of the context in Figure 1. The generalized attributes
are $E := \{a, b, c\}$, $F := \{d, e, f\}$ and $H := \{g, h\}$. The line diagram of generalized patterns
(right) is the line diagram of $\mathfrak{B}(G, S_3, J_3)$, where $S_3 := \{E, F, H\}$ and $J_3$ is obtained by an
$\alpha$-generalization with $\alpha = 60\%$, i.e., the last three columns of $\mathbb{K}_\alpha$.



Generalization can also be conducted on objects to replace some (or all) of them
with generalized objects. A typical situation would be that of two or more customers
forming a new group (e.g., the same residence location, the same profile). We can also
assign to each group all items bought by their members (an $\exists$-generalization) or only
their common items (a $\forall$-generalization), or just some of the frequent items among their
members (similar to an $\alpha$-generalization).

In order to reduce the size of the data to be analyzed, both techniques can apply:
generalizing attributes and then objects or vice-versa or simultaneously. This can be
seen as pre-processing data in order to reduce them and then have a more abstract
perspective over them. Combining generalizations on attributes and on objects
simultaneously gives a kind of hypercontext (similar to hypergraphs [4]), since the
objects are subsets of $G$ and attributes are subsets of $M$. Let $A$ be a group of objects
and $B$ be a group of attributes related to a context $(G, M, I)$. Then, the relation $J$ can be
defined using one or a combination of the following cases:

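1. $A \mathrel{J} B$ iff $\exists a \in A, \exists b \in B$ such that $a \mathrel{I} b$, i.e. some objects from the group $A$ are in
relation with some attributes in the group $B$;

2. $A \mathrel{J} B$ iff $\forall a \in A, \forall b \in B$, $a \mathrel{I} b$, i.e. every object in the group $A$ is in relation with
every attribute in the group $B$;

3. $A \mathrel{J} B$ iff $\forall a \in A, \exists b \in B$ such that $a \mathrel{I} b$, i.e. every object in the group $A$ has at least
one attribute from the group $B$;

4. $A \mathrel{J} B$ iff $\exists b \in B$ such that $\forall a \in A$, $a \mathrel{I} b$, i.e. there is an attribute in the group $B$ that
belongs to all objects of the group $A$;

5. $A \mathrel{J} B$ iff $\forall b \in B, \exists a \in A$ such that $a \mathrel{I} b$, i.e. every property in the group $B$ is
satisfied by at least one object of the group $A$;

6. $A \mathrel{J} B$ iff $\exists a \in A$ such that $\forall b \in B$, $a \mathrel{I} b$, i.e. there is an object in the group $A$ that has
all the attributes in the group $B$;

7. $A \mathrel{J} B$ iff $\frac{\left|\left\{a \in A \;\middle|\; \frac{|\{b \in B \mid a \mathrel{I} b\}|}{|B|} \geq \beta_B\right\}\right|}{|A|} \geq \alpha_A$, i.e. at least $\alpha_A$% of objects in the group $A$
have each at least $\beta_B$% of the attributes in the group $B$;

8. $A \mathrel{J} B$ iff $\frac{\left|\left\{b \in B \;\middle|\; \frac{|\{a \in A \mid a \mathrel{I} b\}|}{|A|} \geq \alpha_A\right\}\right|}{|B|} \geq \beta_B$, i.e. at least $\beta_B$% of attributes in the group
$B$ belong altogether to at least $\alpha_A$% of objects in the group $A$;

9. $A \mathrel{J} B$ iff $\frac{|(A \times B) \cap I|}{|A \times B|} \geq \alpha$, i.e. the density of the rectangle $A \times B$ is at least equal to $\alpha$.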

Remark 1. Cases 7 and 8 generalize Case 1 (take $\alpha_A := \beta_B := \ldots$ for all
$A$ and $B$) and Case 2 (take $\alpha_A := 1$, $\beta_B := 1$ for all $A$ and $B$). Moreover, Case 7 also
generalizes Case 3 (take $\alpha_A := 1$, $\beta_B := \frac{1}{|M|}$ for all $A$ and $B$) and Case 5 (take
$\alpha_A := \ldots$, $\beta_B := 1$ for all $A$ and $B$). However, Cases 4 and 6 cannot be captured by
Case 7, but are captured by Case 8 (take $\alpha_A := 1$, $\beta_B := \ldots$ for all $A$ and $B$ to get
Case 4, and take $\alpha_A := \ldots$, $\beta_B := 1$ for all $A$ and $B$ to get Case 6).
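The nine cases above can be written as predicates over a group of objects and a group of attributes. The following is a minimal sketch; the incidence relation, the function names `case1` to `case9` and the helper `nonempty_subsets` are hypothetical, groups are assumed to be non-empty, and thresholds are given as fractions in $(0, 1]$ rather than percentages.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical incidence relation I as a set of (object, attribute) pairs.
INCIDENCE = {(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "c")}
OBJECTS = [1, 2, 3]
ATTRS = ["a", "b", "c"]


def has(a, b):
    return (a, b) in INCIDENCE


def case1(A, B): return any(has(a, b) for a in A for b in B)
def case2(A, B): return all(has(a, b) for a in A for b in B)
def case3(A, B): return all(any(has(a, b) for b in B) for a in A)
def case4(A, B): return any(all(has(a, b) for a in A) for b in B)
def case5(A, B): return all(any(has(a, b) for a in A) for b in B)
def case6(A, B): return any(all(has(a, b) for b in B) for a in A)


def case7(A, B, alpha, beta):
    """At least a fraction alpha of A has each at least a fraction beta of B."""
    good = [a for a in A if Fraction(sum(has(a, b) for b in B), len(B)) >= beta]
    return Fraction(len(good), len(A)) >= alpha


def case8(A, B, alpha, beta):
    """At least a fraction beta of B belongs to at least a fraction alpha of A."""
    good = [b for b in B if Fraction(sum(has(a, b) for a in A), len(A)) >= alpha]
    return Fraction(len(good), len(B)) >= beta


def case9(A, B, alpha):
    """The density of the rectangle A x B is at least alpha."""
    hits = sum(has(a, b) for a in A for b in B)
    return Fraction(hits, len(A) * len(B)) >= alpha


def nonempty_subsets(items):
    return [set(c) for n in range(1, len(items) + 1) for c in combinations(items, n)]
```

Looping over all pairs of non-empty groups of this toy relation confirms two of the reductions mentioned in Remark 1: Case 7 with both thresholds equal to 1 coincides with Case 2, and Case 7 with $\alpha_A = 1$ and $\beta_B = \frac{1}{|M|}$ (the value assumed here) coincides with Case 3.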

In most cases, a taxonomy is provided either implicitly or explicitly. Let $\mathcal{O}$ be an
ontology on a domain $\mathcal{D}$. We denote by $\mathcal{C}$ the concepts of $\mathcal{O}$ and by $T$ a taxonomy induced by
the is-a hierarchy of $\mathcal{O}$. Then, $T$ is a quasi-order since two concepts can be equivalent
(but not identical in the domain). We can assume that $T$ is a complete lattice by taking
the Dedekind-MacNeille completion of its quotient with respect to the quasi-order. Let
$(G, M, I)$ be a context such that the attributes in $M$ are represented by some concepts
in $\mathcal{C}$. If only some attributes of $(G, M, I)$ are represented in $\mathcal{C}$, then we replace $T$ by
$(T \cup \mu M, \leq_T \cup \leq_M)$. The attributes in $M$ then appear in $T$ at some level. An
$\exists$-generalization is simulated by going one or more levels upward in the taxonomy and a
$\forall$-generalization is obtained by going one or more levels downward in $T$. How many
levels should the user follow to get the expected knowledge?

We consider for example a data mining context $(G, M, I)$, where $G$ is the set of
transactions and $M$ the set of items. With an $\exists$-generalization, some items that were
infrequent can become frequent. One possibility is to keep the items (attributes in $M$) that
are frequent and put the infrequent ones in groups (according to a certain semantics)
so that at least a certain percentage of transactions contains at least one item from
each group. This can be done through an interactive program which suggests some
groupings to the user for validation and feedback. If no taxonomy is provided, one may
be interested or forced to derive a taxonomy from data, which will be used afterwards to
get generalized patterns. How can this be achieved?

4 Visualizing generalized patterns on line diagrams 

4.1 Visualization 

Let $(G, M, I)$ be a formal context and $(G, S, J)$ a context obtained from $(G, M, I)$ via
a generalization on attributes. The usual action is to directly construct a line diagram of
$(G, S, J)$ which contains concepts with generalized attributes (see Figures 2, 3 and 4).
However, one may be interested, after getting $(G, S, J)$ and constructing a line diagram
for $\mathfrak{B}(G, S, J)$, to refine further on the attributes in $M$ or recover the lattice constructed
from $(G, M, I)$.

When storage space is not a constraint, the attributes in $M$ and the generalized
attributes can be kept altogether. This is done using an apposition of $(G, M, I)$ and
$(G, S, J)$ to get $(G, M \cup S, I \cup J)$. A nested line diagram can be used to display the
resulting lattice, with $(G, S, J)$ as first level and $(G, M, I)$ as second level; i.e. we
construct a line diagram for $\mathfrak{B}(G, S, J)$ with nodes large enough to contain copies
of the line diagram of $\mathfrak{B}(G, M, I)$. Figure 5 displays the nested line diagram of the
context in Figure 3 with the generalized attributes $S, T, U, V$ at the first level and the
attributes $a, \ldots, h$ at the inner one. The generalized patterns can also be visualized by
conducting a projection (i.e., a restricted view) on generalized attributes, and keeping
track of the effects of the projection, i.e., we display the projection of the concept lattice
$\mathfrak{B}(G, M \cup S, I \cup J)$ on $S$ by marking the equivalence classes on $\mathfrak{B}(G, M \cup S, I \cup J)$.
Note that two concepts $(A, B)$ and $(C, D)$ are equivalent with respect to the projection
on $S$ iff $B \cap S = D \cap S$ (i.e. their intents have the same restriction on $S$). This is
illustrated by Figure 6.
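The projection on the generalized attributes can be sketched by grouping the intents of the apposition according to their restriction to $S$. The context, the grouping into `P` and `Q` (an $\exists$-generalization) and all names below are hypothetical; the brute-force enumeration of intents is only suitable for toy inputs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy context and grouping (not those of Figures 1 to 3).
CONTEXT = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}, 4: {"c"}}
GROUPS = {"P": {"a", "b"}, "Q": {"c"}}
M = {"a", "b", "c"}
S = set(GROUPS)

# Existential generalization J, then the apposition (G, M u S, I u J).
GENERALIZED = {g: {s for s, members in GROUPS.items() if row & members}
               for g, row in CONTEXT.items()}
APPOSITION = {g: CONTEXT[g] | GENERALIZED[g] for g in CONTEXT}


def intents(context, attributes):
    """All intents of a context, found by closing every attribute subset."""
    attributes = sorted(attributes)
    found = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            intent = set(attributes)
            for row in context.values():
                if set(subset) <= row:
                    intent &= row
            found.add(frozenset(intent))
    return found


# Equivalence classes: concepts whose intents have the same restriction to S.
CLASSES = defaultdict(list)
for intent in intents(APPOSITION, M | S):
    CLASSES[frozenset(intent & S)].append(intent)
```

In this toy case the six concepts of the apposition fall into four classes, one for each intent of the generalized context $(G, S, J)$.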

4.2 Are generalized attributes really generalizations? 

Let us have a close look at the concept lattice $\mathfrak{B}(G, M \cup S, I \cup J)$. Recall that a concept
$u$ is more general than a concept $v$ if $u$ contains more objects than $v$. That is, $v \leq u$, or
$\mathrm{ext}(v) \subseteq \mathrm{ext}(u)$, or $\mathrm{int}(u) \subseteq \mathrm{int}(v)$. We also state that $u$ is a generalization of $v$, and $v$ a
specialization of $u$. For two attributes $a$ and $b$ in $M \cup S$, we should normally assert that
$a$ is a generalization of $b$ or $b$ is a specialization of $a$ whenever $\mu a$ is a generalization of
$\mu b$. Now, let us have a close look at the three cases of attribute generalization.

Fig. 5. A nested line diagram of the context shown in Figure 3 (left). A zoom of the
rightmost large node (right) gives additional information about objects 7 and 8, by
showing that the second one is a specialization of the first object.

In the $\exists$-case (see the left-hand side of Figure 7), an object $g \in G$ is in relation with an attribute $m_s$ iff there is $m \in m_s$ such that $g\,I\,m$. Thus $m_s' = \bigcup\{m' \mid m \in m_s\}$ and $\mu m_s = \bigvee\{\mu m \mid m \in m_s\}$. Therefore, every $\exists$-generalized attribute $m_s$ satisfies $\mu m_s \ge \mu m$ for all $m \in m_s$, and deserves the name of a generalization of the attributes $m \in m_s$.

In the $\forall$-case (see the right-hand side of Figure 7), an object $g \in G$ is in relation with an attribute $m_s$ iff $g\,I\,m$ for all $m \in m_s$. Thus $m_s' = \bigcap\{m' \mid m \in m_s\}$ and $\mu m_s = \bigwedge\{\mu m \mid m \in m_s\}$. Therefore, every $\forall$-generalized attribute $m_s$ satisfies $\mu m_s \le \mu m$ for all $m \in m_s$, and should normally be called a specialization of the attributes $m \in m_s$.

In the $\alpha$-case, with $\frac{1}{|m_s|} < \alpha < 1$, an object $g \in G$ is in relation with an attribute $m_s$ iff $\alpha \le \frac{|\{m \in m_s \mid g\,I\,m\}|}{|m_s|}$. The following situations can happen:

- There is an $\alpha$-generalized attribute $m_s \in S$ with at least one attribute $m \in m_s$ and an object $g \in G$ such that $g\,I\,m$ holds but $g\,J\,m_s$ does not; hence $\mu m \not\le \mu m_s$ in $\mathfrak{B}(G, M \cup S, I \cup J)$; i.e., $\mu m_s$ is not a generalization of $\mu m$, and therefore not a generalization of the $\mu m$'s, $m \in m_s$.

- There is an $\alpha$-generalized attribute $m_s \in S$ with at least one attribute $m \in m_s$ and an object $g \in G$ such that $g\,J\,m_s$ holds but $g\,I\,m$ does not; hence $\mu m_s \not\le \mu m$ in $\mathfrak{B}(G, M \cup S, I \cup J)$; i.e., $\mu m_s$ is not a specialization of $\mu m$, and therefore not a specialization of the $\mu m$'s, $m \in m_s$.



Fig. 6. Projection of the context shown in Figure 3 onto the $\forall$-generalized attributes. 
This is equivalent to the line diagram shown in Figure 3. 

Therefore, there are $\alpha$-generalized attributes $m_s$ that are neither a generalization nor a specialization of the attributes $m \in m_s$. In Figure 8 the element $b$ belongs to the group $E$, but $\mu E$ is neither a specialization nor a generalization of $\mu b$, since $\mu b \not\le \mu E$ and $\mu E \not\le \mu b$. Thus, we should rather call the $\alpha$-case an attribute approximation, the $\forall$-case a specialization and only the $\exists$-case a generalization.
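The three cases can be compared on a small example. The Python sketch below is an illustration only: the context, the group E = {a, b, c} and the function `generalized_extent` are hypothetical, and the $\alpha$-case is assumed to relate an object to the group when at least a fraction $\alpha$ of the group's attributes hold for it. Since $\mu a \le \mu b$ holds exactly when the extent of $a$ is contained in the extent of $b$, comparing attribute extents is enough.

```python
def generalized_extent(context, group, case, alpha=None):
    """Objects related to the generalized attribute built from `group`."""
    extent = set()
    for g, attrs in context.items():
        hits = len(attrs & group)
        if case == "exists":
            related = hits >= 1
        elif case == "forall":
            related = hits == len(group)
        elif case == "alpha":
            related = hits / len(group) >= alpha  # assumed reading of the alpha-case
        else:
            raise ValueError(f"unknown case: {case}")
        if related:
            extent.add(g)
    return extent


# Hypothetical toy context (not one of the paper's figures).
context = {1: {"a", "c"}, 2: {"b"}, 3: {"a", "b", "c"}, 4: {"a"}}
E = {"a", "b", "c"}
column = {m: {g for g, attrs in context.items() if m in attrs} for m in E}

ext_exists = generalized_extent(context, E, "exists")
ext_forall = generalized_extent(context, E, "forall")
ext_alpha = generalized_extent(context, E, "alpha", alpha=0.6)

# exists-case: the group extent contains every member extent (generalization).
is_generalization = all(column[m] <= ext_exists for m in E)
# forall-case: the group extent is contained in every member extent (specialization).
is_specialization = all(ext_forall <= column[m] for m in E)
# alpha-case: compare the group extent with the extent of b in both directions.
alpha_vs_b = (ext_alpha <= column["b"], column["b"] <= ext_alpha)

print(ext_exists, ext_forall, ext_alpha, alpha_vs_b)
```

With $\alpha$ = 60%, object 1 has two of the three attributes and is related to E without having b, while object 2 has b but is not related to E, so the two extents are incomparable.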

5 Controlling the size of generalized concepts 

A generalized concept is a concept whose intent (or extent) contains generalized attributes (or objects). Let us first introduce the example in Figure 9, in which an $\exists$-generalization leads to a generalized concept set larger than the set of initial concepts. The two concepts $\mu m_1$ and $\mu m_2$ will be put together. Although we discard the attributes $m_1$ and $m_2$, the nodes $\gamma g_2$ and $\gamma g_3$ will remain since they will be obtained as $\mu m_{12} \wedge \mu m_4$ and $\mu m_{12} \wedge \mu m_3$ respectively. Then we get the configuration in Figure 9 (right), which has one concept more than the initial concept lattice shown on the left of the same figure.

In the following, we analyze the impact of $\exists$ and $\forall$ attribute generalizations on the 
size of the resulting set of generalized concepts. 

5.1 An $\exists$-generalization on attributes 

Let $(G, M, I)$ be a context and $(G, S, J)$ a context obtained from an $\exists$-generalization on attributes, i.e., the elements of $S$ are groups of attributes from $M$. We set $S = \{m_s \mid s \in S\}$, with $m_s \subseteq M$. Then, an object $g \in G$ is in relation with a generalized attribute $m_s$ if there is an attribute $m$ in $m_s$ such that $g\,I\,m$. To compare the size of the corresponding concept lattices, we can define some mappings. We assume that $(m_s)_{s \in S}$ forms a partition of $M$. Then for each $m \in M$ there is a unique generalized attribute $m_s$ such that $m \in m_s$, and $g\,I\,m$ implies $g\,J\,m_s$, for every $g \in G$. To distinguish between derivations in $(G, M, I)$ and in $(G, S, J)$, we will replace $'$ by the name of the corresponding relation. For example, $g^I = \{m \in M \mid g\,I\,m\}$ and $g^J = \{s \in S \mid g\,J\,s\}$. Two canonical maps $\alpha$ and $\beta$ are defined as follows:

$$\alpha : G \to \mathfrak{B}(G, S, J),\quad g \mapsto \gamma g := (g^{JJ}, g^{J}) \qquad \text{and} \qquad \beta : M \to \mathfrak{B}(G, S, J),\quad m \mapsto \mu m_s := (s^{J}, s^{JJ}),\ \text{where } m \in m_s.$$

Fig. 7. An $\exists$-generalization is a generalization (left) and a $\forall$-generalization is a specialization (right).

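The maps $\alpha$ and $\beta$ induce two order-preserving maps $\varphi$ and $\psi$ (see [14]) defined by 

$$\varphi : \mathfrak{B}(G, M, I) \to \mathfrak{B}(G, S, J),\quad (A, B) \mapsto \bigvee\{\alpha g \mid g \in A\} \qquad \text{and} \qquad \psi : \mathfrak{B}(G, M, I) \to \mathfrak{B}(G, S, J),\quad (A, B) \mapsto \bigwedge\{\beta m \mid m \in B\}.$$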

If $\varphi$ or $\psi$ is surjective, then the generalized concept lattice has at most as many concepts as the original one. As we have seen in Figure 9, both maps can fail to be surjective. Obviously $\varphi(A, B) \le \psi(A, B)$, since $g\,I\,m$ implies $g\,J\,m_s$ and hence $\gamma g \le \mu m_s$ in $\mathfrak{B}(G, S, J)$. When do we have equality? Does equality imply surjectivity?
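The two induced maps can be evaluated mechanically on a small context. The following Python sketch is a hedged illustration: the three-object context, the partition {m1, m2}, {m3}, {m4} and all function names are hypothetical and do not reproduce Figure 9. Concepts of the generalized lattice are represented by their extents, so the order between concepts becomes set inclusion.

```python
def common_attributes(context, attributes, objs):
    """Attributes shared by all objects in `objs` (all attributes if `objs` is empty)."""
    return frozenset(m for m in attributes if all(m in context[g] for g in objs))


def common_objects(context, attrs):
    """Objects having every attribute in `attrs` (all objects if `attrs` is empty)."""
    return frozenset(g for g in context if set(attrs) <= context[g])


def concepts(context, attributes):
    """Naive enumeration of all (extent, intent) pairs of a small context."""
    extents = {frozenset(context)}
    for m in attributes:
        col = common_objects(context, {m})
        extents |= {e & col for e in extents}
    return [(e, common_attributes(context, attributes, e)) for e in extents]


# Hypothetical context (G, M, I) and a partition of M into groups.
M = ["m1", "m2", "m3", "m4"]
context_I = {1: {"m3", "m4"}, 2: {"m1", "m4"}, 3: {"m2", "m3"}}
group_of = {"m1": "s12", "m2": "s12", "m3": "s3", "m4": "s4"}
S = ["s12", "s3", "s4"]

# exists-generalization: g J s iff g has some attribute of the group s.
context_J = {g: {group_of[m] for m in attrs} for g, attrs in context_I.items()}


def phi(A, B):
    """Extent of the join of the object concepts of A in B(G, S, J), i.e. A^JJ."""
    return common_objects(context_J, common_attributes(context_J, S, A))


def psi(A, B):
    """Extent of the meet of the attribute concepts of the groups meeting B."""
    return common_objects(context_J, {group_of[m] for m in B})


original = concepts(context_I, M)
generalized_extents = {e for e, _ in concepts(context_J, S)}
image_phi = {phi(A, B) for A, B in original}
image_psi = {psi(A, B) for A, B in original}
phi_below_psi = all(phi(A, B) <= psi(A, B) for A, B in original)

print(len(original), len(generalized_extents), len(image_phi), len(image_psi))
```

In this particular toy context the original lattice has seven concepts and the generalized one has eight, so neither map can be surjective; the generalized concept with extent {2, 3} is missed by both. This is an observation about the example only.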

Now we present some special cases where the number of concepts does not increase 
after a generalization. 

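Case 1 Every $m_s$ has a greatest element $\top_s$. Then the context $(G, S, J)$ is a projection of $(G, M, I)$ on the set $M_S := \{\top_s \mid s \in S\}$ of greatest elements of the groups $m_s$. Thus $\mathfrak{B}(G, S, J) \cong \mathfrak{B}(G, M_S, I \cap (G \times M_S))$ and is a sub-order of $\mathfrak{B}(G, M, I)$. Hence $|\mathfrak{B}(G, S, J)| = |\mathfrak{B}(G, M_S, I \cap (G \times M_S))| \le |\mathfrak{B}(G, M, I)|$.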

Case 2 The union $\bigcup\{m^I \mid m \in m_s\}$ is an extent of $(G, M, I)$, for any $m_s \in S$. Then no grouping produces a new concept. Hence the number of concepts cannot increase.
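The condition of Case 2 is easy to test on small data. The Python sketch below is an illustration under assumptions: the context and both groupings are hypothetical, the function names are ours, and concepts are counted by a naive enumeration of extents that is only practical for tiny contexts.

```python
def extents(context, attributes):
    """All concept extents: the intersections of attribute extents, plus G itself."""
    column = {m: frozenset(g for g in context if m in context[g]) for m in attributes}
    result = {frozenset(context)}
    for m in attributes:
        result |= {e & column[m] for e in result}
    return result


def exists_generalize(context, groups):
    """g J s iff g has at least one attribute of the group s."""
    return {g: {s for s, members in groups.items() if attrs & members}
            for g, attrs in context.items()}


def unions_are_extents(context, attributes, groups):
    """Case 2 test: the union of the member extents of every group is an extent."""
    known = extents(context, attributes)
    return all(
        frozenset(g for g in context if context[g] & members) in known
        for members in groups.values()
    )


# Hypothetical context and two hypothetical partitions of its attributes.
M = ["m1", "m2", "m3", "m4"]
context = {1: {"m3", "m4"}, 2: {"m1", "m4"}, 3: {"m2", "m3"}}
merge_12 = {"s12": {"m1", "m2"}, "s3": {"m3"}, "s4": {"m4"}}
merge_34 = {"s1": {"m1"}, "s2": {"m2"}, "s34": {"m3", "m4"}}

n_original = len(extents(context, M))
n_merge_12 = len(extents(exists_generalize(context, merge_12), list(merge_12)))
n_merge_34 = len(extents(exists_generalize(context, merge_34), list(merge_34)))

print(n_original, n_merge_12, unions_are_extents(context, M, merge_12))
print(n_original, n_merge_34, unions_are_extents(context, M, merge_34))
```

Grouping m1 with m2 violates the condition (the union {2, 3} is not an extent) and the number of concepts grows from seven to eight, whereas grouping m3 with m4 satisfies it and the number drops to four.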

The following result (Theorem 2) gives an important class of lattices for which the $\exists$-generalization does not increase the size of the lattice. We recall that a lattice $L$ is distributive if for $x$, $y$ and $z$ in $L$, we have $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$. A context is object reduced if no row can be obtained as the intersection of some other rows.

Fig. 8. An $\alpha$-generalization on the attributes of the context in Figure 1 that is neither a generalization nor a specialization! The generalized attributes are $E := \{a, b, c\}$, $F := \{d, e, f\}$ and $H := \{g, h\}$. We take $\alpha = 60\%$. The $\alpha$-generalized concept $\mu E$ is neither a specialization nor a generalization of the concept $\mu b$.

Theorem 2. The $\exists$-generalizations on distributive concept lattices whose contexts are 
object reduced do not increase the size of the concept lattice. 

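Proof. Let $(G, M, I)$ be an object reduced context such that $\mathfrak{B}(G, M, I)$ is a distributive lattice. Let $(G, S, J)$ be a context obtained by an $\exists$-generalization on the attributes in $M$. Let $m_s$ be a generalized attribute, i.e., a group of attributes of $M$. It is enough to prove that $m_s^J$ is an extent of $(G, M, I)$. By definition, we have

$$m_s^J = \bigcup\{m^I \mid m \in m_s\} \subseteq \Big(\bigcup\{m^I \mid m \in m_s\}\Big)^{II} = \mathrm{ext}\Big(\bigvee\{\mu m \mid m \in m_s\}\Big).$$

Let $g \in \mathrm{ext}(\bigvee\{\mu m \mid m \in m_s\})$. We have $\gamma g \le \bigvee\{\mu m \mid m \in m_s\}$. Thus

$$\gamma g = \gamma g \wedge \bigvee\{\mu m \mid m \in m_s\} = \bigvee\{\gamma g \wedge \mu m \mid m \in m_s\} = \gamma g \wedge \mu m \quad \text{for some } m \in m_s,$$

where the second equality uses distributivity and the last one holds because the context is object reduced, so that $\gamma g$ is join-irreducible. Therefore $\gamma g \le \mu m$, and $g \in m^I \subseteq m_s^J$. This proves that $\mathrm{ext}(\bigvee\{\mu m \mid m \in m_s\}) \subseteq m_s^J$, and $m_s^J = \mathrm{ext}(\bigvee\{\mu m \mid m \in m_s\})$.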

Remark 2. The cases discussed above are not the only ones where the size does not increase. For example, if we conduct the groupings of attributes one after another, and each intermediate state does not increase the size of the lattice, or the overall number of new concepts is less than the number of deleted concepts in the whole process, then the lattice of generalized concepts is of smaller size (see the empirical study in Section 6).



Fig. 9. The concept lattice on the right is obtained from the concept lattice on the left by an $\exists$-generalization on attributes that puts $m_1$ and $m_2$ together to get $m_{12}$. The number of concepts has increased.

5.2 A $\forall$-generalization on attributes 

Let $(G, S, J)$ be a context obtained from $(G, M, I)$ by a $\forall$-generalization. In the context $(G, M \cup S, I \cup J)$, each attribute concept $\mu m_s$ is reducible. This means that $m_s^J = \bigcap\{m^{I \cup J} \mid m \in m_s\} = \bigcap\{m^I \mid m \in m_s\}$, and is an extent of $(G, M, I)$. Therefore, $|\mathfrak{B}(G, S, J)| \le |\mathfrak{B}(G, M \cup S, I \cup J)| = |\mathfrak{B}(G, M, I)|$.

Theorem 3. The $\forall$-generalizations on attributes do not increase the size of the concept lattice. 
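The argument behind Theorem 3 can be checked on a small example: every extent of the $\forall$-generalized context should already be an extent of the original context. The Python sketch below is an illustration only; the context, the two groups and the function names are hypothetical.

```python
def extents(context, attributes):
    """All concept extents: the intersections of attribute extents, plus G itself."""
    column = {m: frozenset(g for g in context if m in context[g]) for m in attributes}
    result = {frozenset(context)}
    for m in attributes:
        result |= {e & column[m] for e in result}
    return result


def forall_generalize(context, groups):
    """g J s iff g has every attribute of the group s."""
    return {g: {s for s, members in groups.items() if members <= attrs}
            for g, attrs in context.items()}


# Hypothetical context and grouping (not taken from the paper).
M = ["a", "b", "c", "d"]
context = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "c"}, 4: {"c", "d"}}
groups = {"ab": {"a", "b"}, "cd": {"c", "d"}}

original_extents = extents(context, M)
generalized_extents = extents(forall_generalize(context, groups), list(groups))

print(len(original_extents), len(generalized_extents))
print(generalized_extents <= original_extents)
```

Here the generalized context has four extents, all of which occur among the eight extents of the original context.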

6 Experimentation 

We conducted our experimentation over 100 synthetic contexts of various sizes. The number of objects ranges from 50 to 10 000 and the number of attributes ranges from 25 to 150. The number of concepts of the generated contexts ranges from 70 thousand to 850 million. Obviously, producing and displaying such a huge set of concepts is very time-consuming or even impossible. In our experiments, the fanout, i.e., the number of simple attributes per generalized attribute, varies from 2 to 20 and was simulated by randomly grouping the attributes two by two, three by three and so on. For each fanout value and for each context, the new generalized context is computed and the number of generalized concepts is calculated using Concept Explorer (http://conexp.sourceforge.net). We summarize the results of the experimentation in the figures below. In Figure 10 we can see that the generalization process not only reduces the context size but can also considerably reduce the size of the corresponding lattice. Moreover, the number of generalized concepts is almost inversely proportional to the fanout. However, one can see from Figure 10-(b) and (d) that when the fanout is equal to 2, the number of generalized concepts can be greater than the number of original concepts. Figure 11 summarizes the lattice reduction as a ratio between the number of original concepts and the number of generalized ones. We can notice in Figure 11-(b) that the reduction is neither linear nor proportional to the fanout but can be very significant. Indeed, with an attribute grouping of size 10 a ratio of 37722 is obtained. This means that the size of the original concept set is almost forty thousand times the number of generalized concepts, and hence there is a significant reduction in the size of the generalized lattice.

Fig. 10. Summarization of experiments on different synthetic contexts. Panels: (a) contexts with 10 000 objects and 25 attributes; (b) contexts with 1000 objects and 50 attributes; (c) contexts with 500 objects and 150 attributes; (d) contexts with 200 objects and 50 attributes. The horizontal axis is the fanout.

Fig. 11. Summarization of the gain (i.e. size reduction) obtained on different synthetic contexts.
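The experimental loop can be sketched in a few lines of Python. This is a minimal sketch under several assumptions, not the authors' setup: the context is far smaller than those in the study, the random density of 0.3 and the seed are arbitrary, an $\exists$-generalization is assumed (the text does not state which case was used), and a naive enumeration replaces Concept Explorer. All function names are hypothetical.

```python
import random


def count_concepts(rows, n_attributes):
    """Number of concepts of a context given as a list of attribute-index sets."""
    extents = {frozenset(range(len(rows)))}
    for m in range(n_attributes):
        column = frozenset(g for g, attrs in enumerate(rows) if m in attrs)
        extents |= {e & column for e in extents}
    return len(extents)


def random_groups(n_attributes, fanout, rng):
    """Random partition of the attributes into groups of `fanout` (the last may be smaller)."""
    shuffled = list(range(n_attributes))
    rng.shuffle(shuffled)
    return [set(shuffled[i:i + fanout]) for i in range(0, n_attributes, fanout)]


def exists_generalize(rows, groups):
    """An object gets group s iff it has at least one attribute of that group."""
    return [{s for s, members in enumerate(groups) if attrs & members} for attrs in rows]


rng = random.Random(0)  # arbitrary seed, for repeatability only
n_objects, n_attributes, density = 30, 12, 0.3
rows = [{m for m in range(n_attributes) if rng.random() < density}
        for _ in range(n_objects)]

n_original = count_concepts(rows, n_attributes)
gain = {}
for fanout in (2, 3, 4, 6):
    groups = random_groups(n_attributes, fanout, rng)
    n_generalized = count_concepts(exists_generalize(rows, groups), len(groups))
    gain[fanout] = n_original / n_generalized  # ratio original / generalized
    print(fanout, n_generalized, round(gain[fanout], 2))
```

A gain below 1 for some fanout would correspond to the situation reported for fanout 2, where the generalized lattice is larger than the original one.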



7 Related work 



There is a set of studies [3,7,8,9,10,15,17,30,32] about the possible collaborations between formal concept analysis and ontology engineering (e.g., ontology merging and mapping) to let the two formalisms benefit from each other's strengths. Starting from the fact that both domain ontologies and FCA aim at modeling concepts, [7] show how FCA can be exploited to support ontology engineering (e.g., ontology construction and exploration), and conversely how ontologies can be fruitfully used in FCA applications (e.g., extracting new knowledge). In [30], the authors propose a bottom-up approach called FCA-MERGE for merging ontologies using a set of documents as input. The method relies on techniques from natural language processing and FCA to produce a lattice of concepts. The approach has three steps: (i) the linguistic analysis of the input, which returns two formal contexts, (ii) the merge of the two contexts and the computation of the pruned concept lattice, and (iii) the semi-automatic ontology creation phase, which relies partially on the user's interaction. The two formal contexts produced at Step 1 are of the form $\mathbb{K}_i := (D, M_i, I_i)$ where $i \in \{1, 2\}$, $D$ is a set of documents, $M_i$ is the set of concepts of Ontology $i$ found in $D$, and $I_i$ is a binary relation between $D$ and $M_i$. Starting from a set of domain-specific texts, [15] proposes a semi-automatic method for ontology extraction and design based on FCA and the Horn clause model. [10] studies the role of FCA in reusing independently developed domain ontologies. To that end, an ontology-based method for evaluating similarity between FCA concepts is defined to perform some Semantic Web activities such as ontology merging and ontology mapping. In [32], an approach towards the construction of a domain ontology using FCA is proposed. The resulting ontology is represented as a concept lattice and expressed via the Semantic Web Rule Language (SWRL) to facilitate ontology sharing and reasoning.

Ontology mapping [19] is seen as one of the key techniques for data integration (and 
mediation) between databases with different ontologies. In [9], a method for ontology 
mapping, called FCA-Mapping, is defined based on FCA and allows the identification 
of equal and subclass mapping relations. In HI, FCA is also used to propose an ontology 
mediation method for ontology merging. The resulting ontology includes new concepts 
not originally found in the input ontologies but excludes some redundant or irrelevant 
concepts. 

Since ontologies describe concepts and relations between them, [16] have handled the problem of mining relational data sets in the framework of FCA and proposed an extension to FCA called relational concept analysis. Relational data sets are collections in which objects are described both by their own attributes/properties and by their links with other objects.

In the general field of association rule mining, there are many efforts to integrate 
knowledge in the process of rule extraction to produce generalized patterns [28]. For 
example, HI uses a domain ontology, including relations between concepts, to discover 
generalized sequential patterns. 

8 Conclusion 

In this paper we have studied the problem of using a taxonomy on objects and/or attributes in the framework of formal concept analysis under three main cases of generalization ($\exists$, $\forall$, and $\alpha$) and have shown that (i) the set of generalized concepts is generally smaller than the set of patterns extracted from the original set of attributes (before generalization), and (ii) the generalized concept lattice not only embeds new patterns on generalized attributes but also reveals particular features of objects and may unveil a new taxonomy on objects. A careful analysis of the three cases of attribute generalization led to the following conclusion: the $\alpha$-case is an attribute approximation, the $\forall$-case is an attribute specialization, while only the $\exists$-case is actually an attribute generalization. Different scenarios of a simultaneous generalization on objects and attributes are also discussed based on the three cases of generalization.

Since we focused our analysis on the integration of taxonomies in FCA to produce 
generalized concepts, our further research concerns the theoretical study of the mapping 
between a rule set on original attributes and a rule set of generalized attributes as well as 
the exploitation of other components of a domain ontology such as general links (other 
than is-a hierarchies) between generic concepts or their instances. 